{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.  \n",
    "\n",
    "Licensed under the MIT License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train, hyperparameter tune and convert a PyTorch model into ONNX\n",
    "\n",
    "In this tutorial, you will train, hyperparameter tune, convert the model to ONNX and register a PyTorch model using the Azure Machine Learning (Azure ML) Python SDK. Additionally you will learn how to leverage ML pipelines to create a reproducible training process.\n",
    "\n",
    "This tutorial will train an image classification model using transfer learning, based on PyTorch's [Transfer Learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). The model is trained to classify chickens and turkeys by first using a pretrained ResNet18 model that has been trained on the [ImageNet](http://image-net.org/index) dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import azureml.core\n",
    "from azureml.core import Workspace, Experiment\n",
    "from azureml.core.datastore import Datastore\n",
    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
    "from azureml.exceptions import ComputeTargetException\n",
    "from azureml.data.data_reference import DataReference\n",
    "from azureml.pipeline.steps import HyperDriveStep, HyperDriveStepRun\n",
    "from azureml.pipeline.core import Pipeline, PipelineData\n",
    "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n",
    "from azureml.train.hyperdrive import choice, loguniform\n",
    "from azureml.pipeline.core import PipelineParameter\n",
    "from azureml.pipeline.steps import PythonScriptStep\n",
    "from azureml.pipeline.core import Pipeline, PipelineRun\n",
    "\n",
    "import os\n",
    "import shutil\n",
    "import urllib\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "# Check core SDK version number\n",
    "\n",
    "print(\"SDK version:\", azureml.core.VERSION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize workspace\n",
    "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.workspace import Workspace\n",
    "\n",
    "ws = Workspace.from_config()\n",
    "print('Workspace name: ' + ws.name, \n",
    "      'Azure region: ' + ws.location, \n",
    "      'Subscription id: ' + ws.subscription_id, \n",
    "      'Resource group: ' + ws.resource_group, sep='\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download training data\n",
    "The dataset we will use (located on a public blob [here](https://msdocsdatasets.blob.core.windows.net/pytorchfowl/fowl_data.zip) as a zip file) consists of about 120 training images each for turkeys and chickens, with 100 validation images for each class. The images are a subset of the [Open Images v5 Dataset](https://storage.googleapis.com/openimages/web/index.html). The unzipped files are in provided in the repository. The cell below you can learn how to easily upload your data into a datastore for traceability. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get the default workspace\n",
    "data_dir = ws.get_default_datastore()\n",
    "\n",
    "# upload data\n",
    "#data_dir.upload('data', overwrite=True, show_progress=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create or Attach existing AmlCompute\n",
    "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you will create a cpu cluster or select an existing one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
    "from azureml.core.compute_target import ComputeTargetException\n",
    "\n",
    "# choose a name for your cluster\n",
    "cluster_name = \"cpu-cluster\"\n",
    "\n",
    "try:\n",
    "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
    "    print('Found existing compute target.')\n",
    "except ComputeTargetException:\n",
    "    print('Creating a new compute target...')\n",
    "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D3_V2', \n",
    "                                                           max_nodes=4)\n",
    "    # create the cluster\n",
    "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
    "\n",
    "    compute_target.wait_for_completion(show_output=True)\n",
    "\n",
    "# use get_status() to get a detailed status for the current cluster. \n",
    "# print(compute_target.get_status().serialize())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train model on the remote compute\n",
    "Now that you have your data and training script prepared, you are ready to train on your remote compute cluster."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a project directory\n",
    "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "#create the project directory \n",
    "script_folder = './pytorch-birds'\n",
    "os.makedirs(script_folder, exist_ok=True)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare training script\n",
    "Now you will need to create your training script. In this tutorial, the training script is already provided for you at `pytorch_train.py`. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.\n",
    "\n",
    "However, if you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script. \n",
    "\n",
    "In `pytorch_train.py`, we will log some metrics to our Azure ML run. To do so, we will access the Azure ML `Run` object within the script:\n",
    "```Python\n",
    "from azureml.core.run import Run\n",
    "run = Run.get_context()\n",
    "```\n",
    "Further within `pytorch_train.py`, we log the learning rate and momentum parameters, and the best validation accuracy the model achieves:\n",
    "```Python\n",
    "run.log('lr', np.float(learning_rate))\n",
    "run.log('momentum', np.float(momentum))\n",
    "\n",
    "run.log('best_val_acc', np.float(best_acc))\n",
    "```\n",
    "These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once your script is ready, copy the training script `pytorch_train.py` into your project directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#copy the training script into the project directory \n",
    "shutil.copy( 'pytorch_train.py', script_folder)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a PyTorch estimator\n",
    "The Azure ML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch). The following code will define a single-node PyTorch job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "dnn-pytorch-remarks-sample"
    ]
   },
   "outputs": [],
   "source": [
    "from azureml.train.dnn import PyTorch\n",
    "\n",
    "#create a Pytorch estimator \n",
    "est = PyTorch(source_directory=script_folder, \n",
    "                    compute_target=compute_target,\n",
    "                    entry_script='pytorch_train.py',\n",
    "                    pip_packages=['pillow==5.4.1'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.pipeline.core import PipelineData\n",
    "\n",
    "output = PipelineData(\"output\", datastore=data_dir)\n",
    "# define the input and output\n",
    "input_data = DataReference(\n",
    "        datastore=data_dir,\n",
    "        data_reference_name=\"input_data\",\n",
    "        path_on_datastore=\"fowl_data\")\n",
    "                         "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `estimator_entry_script_arguments` parameter is contains the arguments to your training script `entry_script`. Please note the following:\n",
    "- We passed our training data reference `input_data` to our script's `--data_dir` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the training data `fowl_data` on our datastore.\n",
    "- The `outputs` directory is specially treated by Azure ML in that all the content in this directory gets uploaded to your workspace as part of your run history. The files written to this directory are therefore accessible even once your remote run is over. In this tutorial, we will save our trained model to this output directory in the training script.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a hyperparameter step in an ML pipeline\n",
    "Now that we've created a estimator let's get the most accurate model by using Azure Machine Learning's hyperparameter tuning capabilities. You will also learn how to create a pipeline step for an ML pipeline. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Start a hyperparameter sweep\n",
    "First, we will define the hyperparameter space to sweep over. Since our training script uses a learning rate schedule to decay the learning rate every several epochs, let's tune the initial learning rate and the momentum parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (`best_val_acc`).\n",
    "\n",
    "Then, we specify the early termination policy to use to early terminate poorly performing runs. Here we use the `BanditPolicy`, which will terminate any run that doesn't fall within the slack factor of our primary evaluation metric. In this tutorial, we will apply this policy every epoch (since we report our `best_val_acc` metric every epoch and `evaluation_interval=1`). Notice we will delay the first policy evaluation until after the first `10` epochs (`delay_evaluation=10`).\n",
    "Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, uniform, PrimaryMetricGoal\n",
    "\n",
    "param_sampling = RandomParameterSampling( {\n",
    "        'learning_rate': uniform(0.0005, 0.005),\n",
    "        'momentum': uniform(0.9, 0.99)\n",
    "    }\n",
    ")\n",
    "\n",
    "early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)\n",
    "\n",
    "hd_config = HyperDriveConfig(estimator=est,\n",
    "                             hyperparameter_sampling=param_sampling, \n",
    "                             policy=early_termination_policy,\n",
    "                             primary_metric_name='best_val_acc',\n",
    "                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
    "                             max_total_runs=8,\n",
    "                             max_concurrent_runs=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Here you will learn to create a hyperdrive step for part of an ML pipeline "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# let's create the hyperdrive step \n",
    "# input_data = DataReference(\n",
    "#         datastore=data_dir,\n",
    "#         data_reference_name=\"input_data\",\n",
    "#         path_on_datastore=\"fowl_data\")\n",
    "\n",
    "metrics_output_name = 'metrics_output'\n",
    "metrics_data = PipelineData(name='metrics_data',\n",
    "                            datastore=data_dir,\n",
    "                            pipeline_output_name=metrics_output_name)\n",
    "\n",
    "hd_step_name='Hyperdrive_step'\n",
    "hd_step = HyperDriveStep(\n",
    "    name=hd_step_name,\n",
    "    hyperdrive_config=hd_config,\n",
    "    estimator_entry_script_arguments=[\"--data_dir\", input_data, \"--num_epochs\", 5],\n",
    "    inputs=[input_data],\n",
    "    #outputs=[output],\n",
    "    metrics_output=metrics_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run an ML pipeline\n",
    "Finally, lauch the hyperparameter tuning job as part of an ML pipeline. To learn more about how to create a full ML pipeline for a training process (with steps such as data preperation), check out these [notebook samples](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/machine-learning-pipelines)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "exp = Experiment(workspace=ws, name='Hyperdrive_sample')\n",
    "pipeline1 = Pipeline(workspace=ws, steps=[hd_step])\n",
    "pipeline_run = exp.submit(pipeline)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Find the best model\n",
    "Once all the runs complete, we can find the run that produced the model with the highest accuracy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hd_step_run = HyperDriveStepRun(step_run=pipeline_run.find_step_run(hd_step_name)[0])\n",
    "best_run = hd_step_run.get_best_run_by_primary_metric()\n",
    "best_run"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Convert to ONNX\n",
    "Now that we have the best model, let's convert the model to ONNX to optimize the model for scoring. For more information about ONNX see details [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-onnx)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "#download the model \n",
    "best_run.download_file('outputs/model.pt', output_file_path='model.pt')\n",
    "\n",
    "best_run_model = torch.load('model.pt')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# export the model as onnx \n",
    "import torch.onnx\n",
    "\n",
    "# Standard ImageNet input - 3 channels, 224x224,\n",
    "# values don't matter as we care about network structure.\n",
    "# But they can also be real inputs.\n",
    "dummy_input = torch.randn(1, 3, 224, 224)\n",
    "\n",
    "# Invoke export\n",
    "torch.onnx.export(best_run_model, dummy_input, \"model.onnx\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import onnx \n",
    "\n",
    "# confirm successful export\n",
    "onnx_model = onnx.load('model.onnx')\n",
    "\n",
    "# Check that the IR is well formed\n",
    "onnx.checker.check_model(onnx_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# by registering the model you can then trigger devops release pipeline which will package and deploy your model\n",
    "from azureml.core.model import Model\n",
    "\n",
    "model = Model.register(workspace = ws,\n",
    "                       model_path = \"model.onnx\",\n",
    "                       model_name = \"onnx-birds\")\n",
    "\n",
    "print(model.name, model.id, model.version, sep = '\\t')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next Steps\n",
    "Create a devops pipeline to trigger a release, everytime a new model is created"
   ]
  }
 ],
 "metadata": {
  "authors": [
   {
    "name": "ninhu"
   }
  ],
  "category": "training",
  "compute": [
   "AML Compute"
  ],
  "datasets": [
   "ImageNet"
  ],
  "deployment": [
   "Azure Container Instance"
  ],
  "exclude_from_index": false,
  "framework": [
   "PyTorch"
  ],
  "friendly_name": "Training with hyperparameter tuning using PyTorch",
  "index_order": 1,
  "kernelspec": {
   "display_name": "Python 3.6 - AzureML",
   "language": "python",
   "name": "python3-azureml"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  },
  "tags": [
   "None"
  ],
  "task": "Train an image classification model using transfer learning with the PyTorch estimator"
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
